Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 6:59am
Forum: etree Subject: Re: Batting Averages revisited

I'm curious why you think that an artist having few shows in the Archive would affect the batting average? The denominator is visits for that particular show, and the numerator is downloads for that particular show.

I'd love to hear more of your thoughts about this.

This post was modified by Jonathan Aizen on 2003-05-29 13:59:36

Poster: woostahDave Date: May 29, 2003 6:19am
Forum: etree Subject: Re: Batting Averages revisited

i imagine this would affect batting averages because when there are many shows for a band, people are likely to shop around, view the details of many shows and then decide which ones to download. if there are only a couple of shows for a band i am interested in, i am going to download most of them. i do not think this invalidates the statistic. the point is to highlight some interesting/ unusual acts.

This post was modified by woostahDave on 2003-05-29 13:19:05

Poster: kwaved Date: May 29, 2003 6:02am
Forum: etree Subject: Re: Batting Averages revisited

i imagine this would affect batting averages because when there are many shows for a band, people are likely to shop around, view the details of many shows and then decide which ones to download.

Yes, this is precisely how and why I think the BA stat is skewed by having a small number of available shows for download. If a band has only 20 shows on the archive, clearly the most popular of those will be downloaded and accessed more frequently than a given set of shows for a band with 200 or more shows in the collection, thus skewing the results. This says nothing of the fact that only HTTP access results are reflected in the stat in the first place, which also, IMO, creates a bias towards less computer-sophisticated downloaders.

I guess part of my objection to any importance of this stat is that my particular favorite artists are seldom listed with high BA stats. This makes me think of possibilities for customizing the archive.org homepage by creation of a "My Archive" type profile which lists only those bands I am interested in and those sections of the homepage that interest me ... definitely a big nut to crack but something worth thinking about.


Poster: woostahDave Date: May 29, 2003 6:15am
Forum: etree Subject: Re: Batting Averages revisited

a good place to start with this would be the ability to mark certain bands and to be notified when a new show for that band is uploaded. i know the admins have plenty on their plate as it is, but i just wanted to throw this out as something to think about down the line.

Poster: kwaved Date: May 29, 2003 6:21am
Forum: etree Subject: Re: Batting Averages revisited


Exactly --- The "My Archive" profile would list those bands that a person is interested in and then the stats sections on the home page would be filtered through this list.

A further enhancement to the profile would allow the selection of those stats sections that comprise the homepage itself ala "My Yahoo" ...
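
The band filter described in this post could be sketched roughly as follows. This is only an illustration of the idea, not anything the Archive has built: `ShowStat`, `filter_for_profile`, the band names, and the numbers are all made up.

```python
from dataclasses import dataclass


@dataclass
class ShowStat:
    band: str
    show: str
    batting_average: float


def filter_for_profile(stats, favorite_bands):
    """Keep only entries for bands in the user's list, best BA first."""
    wanted = {band.lower() for band in favorite_bands}
    kept = [s for s in stats if s.band.lower() in wanted]
    return sorted(kept, key=lambda s: s.batting_average, reverse=True)


# Hypothetical site-wide stats; the values are placeholders.
site_wide = [
    ShowStat("Band A", "2002-08-18", 0.61),
    ShowStat("Band B", "2003-02-01", 0.48),
    ShowStat("Band B", "2003-03-15", 0.52),
]
my_archive = filter_for_profile(site_wide, ["Band B"])
```

With a profile listing only "Band B", the stats section would show that band's two shows and nothing else, ordered by batting average.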

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 7:00am
Forum: etree Subject: Re: Batting Averages revisited

Yeah, that makes sense. The batting average notion is, on the outside, very simple, but does have some complications. This discussion is a good example.

You'd think that shows by a band with very few shows in the Archive would have very high BAs, but I think that's not entirely true. It would only hold true if that band were very popular, yet had few shows. The thing is, if the band isn't popular, the visit count for each show will be low, and as such the batting average will be reduced more by the "small sample size penalty" (BA = downloads/visits - constant/sqrt(visits)). See what I'm saying? Its raw BA will be high, but its computed BA will be low unless there are many visits.
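
The formula in the paragraph above can be written as a small function. This is a minimal sketch rather than the Archive's actual code: the function and parameter names are invented for illustration, and the constant is passed in explicitly because its value is not stated at this point in the thread (1.0 below is an arbitrary placeholder).

```python
import math


def batting_average(downloads: int, visits: int, constant: float) -> float:
    """Adjusted BA = downloads/visits - constant/sqrt(visits).

    Names are hypothetical; only the formula comes from the post.
    """
    if visits <= 0:
        raise ValueError("visits must be positive")
    raw = downloads / visits                # raw BA
    penalty = constant / math.sqrt(visits)  # small sample size penalty
    return raw - penalty


# Same raw BA (90%), very different visit counts.
# constant=1.0 is an arbitrary illustrative value.
few_visits = batting_average(9, 10, constant=1.0)
many_visits = batting_average(900, 1000, constant=1.0)
```

Both shows have the same raw BA, but the one with only 10 visits ends up with a much lower computed BA because its penalty term is larger.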

At the same time, the batting average computation is supposed to help the underdog, so some such shows will end up there, but I think that's not what is being discussed here. This leads into part of what Alan was talking about, which is "my favorite bands are never up there" - chances are that your favorite bands can't compete with the more popular bands, who are affected less by the sample size penalty. The research we've done shows that this adjustment really does provide a reasonable way to rank items in the Archive. On the other hand, if the shows by your favorite bands have very high visit counts (i.e. they're "more popular" than those in the top BA list), yet are not in the top BA list, then they're just not downloaded as much (in terms of frequency).

As for the "My Archive" idea - I don't see it replacing the LMA landing page, but it might become a secondary page. It is no simple task though - regardless, it's possible I'll be able to incorporate it in rev2 (not sure yet when that'll happen, but it'll be a big time upgrade).

Jon

This post was modified by Jonathan Aizen on 2003-05-29 14:00:39

Poster: woostahDave Date: May 29, 2003 7:11am
Forum: etree Subject: Re: Batting Averages revisited

jon, i am more interested in seeing how many times a show has been downloaded rather than its batting average. i did request this before, but since it should be fairly simple to implement i am just reiterating my request here. i would really like to see full stats when viewing a band's list of shows. currently it just shows page views i think.

Poster: daveg Date: May 29, 2003 10:55am
Forum: etree Subject: Re: Batting Averages revisited

jon, i am more interested in seeing how many times a show has been downloaded rather than its batting average. i did request this before, but since it should be fairly simple to implement i am just reiterating my request here. i would really like to see full stats when viewing a band's list of shows. currently it just shows page views i think.
Hear hear! Never really understood this BA thing anyway...a totally useless statistic to me since it is largely subjective IMO (ducks), but hey- I'm the weird one anyway.

This post was modified by daveg on 2003-05-29 17:55:32

Poster: kwaved Date: May 29, 2003 11:41am
Forum: etree Subject: Re: Batting Averages revisited

While I too do not particularly value the BA stat nor, for that matter, like its prominence on the archive.org homepage, I would still love to hear just how it is that the BA stat is subjective.

This post was modified by kwaved on 2003-05-29 18:41:23

Poster: daveg Date: May 29, 2003 12:12pm
Forum: etree Subject: Re: Batting Averages revisited

While I too do not particularly value the BA stat nor, for that matter, like its prominence on the archive.org homepage, I would still love to hear just how it is that the BA stat is subjective.


Yeah-my bad. Actually should have said that it is a stat that really says nothing in determining how "good" a performance is, which IMO, is the bottom line of my d/ling stuff here. Would rather have a crappy recording of a great performance than a great recording of a poor performance...

This post was modified by daveg on 2003-05-29 19:12:18

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 3:31pm
Forum: etree Subject: Re: Batting Averages revisited

Would you prefer the old "most downloaded" (which is inaccurate anyway, meaning the show whose download link was clicked the most)? With that you get a static list of 5 shows that are the most downloaded and always will be because they're on that list.

How do you propose we present you with the list of "recordings that sound good" in an automatic fashion?

This post was modified by Jonathan Aizen on 2003-05-29 22:31:48

Poster: Diana Hamilton (Administrator, Curator, or Staff) Date: May 30, 2003 1:22am
Forum: etree Subject: Re: Batting Averages revisited

How do you propose we present you with the list of "recordings that sound good" in an automatic fashion?

I sure hope no one proposes, "The ones with the most review stars." (Review ranking really suffers from the "it's all good dude" effect.) :P

Poster: kwaved Date: May 30, 2003 1:51am
Forum: etree Subject: Reviews

(Review ranking really suffers from the "it's all good dude" effect.)

{LOL} Yep it sure does but still the reviews feature definitely drives traffic to particular shows. Most intelligent folks can read & write reviews that are, ehem, not "all good" and thereby greatly influence views & downloads.

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 30, 2003 2:53am
Forum: etree Subject: Re: Reviews

We've spent some time thinking about an automated way of assigning a "review impact rating" - which might help illuminate well-thought-out reviews. Kind of like at Amazon where you can say "I found/didn't find this review helpful" - except this would be automatic.

More on your other post a little later (busy packing for my impending cross-country move). I really like the ideas you presented in that post - I will definitely bring them up at our next research meeting.

Poster: kwaved Date: May 30, 2003 3:48am
Forum: etree Subject: Re: Reviews

I really like the ideas you presented in that post - I will definitely bring them up at our next research meeting.

Cool -- thanks Jon for all the amazing work you do for this incredible resource. I am very happy and proud to be a part of it.

Poster: daveg Date: May 30, 2003 7:47am
Forum: etree Subject: Re: Batting Averages revisited

Would you prefer the old "most downloaded" (which is inaccurate anyway, meaning the show whose download link was clicked the most)? With that you get a static list of 5 shows that are the most downloaded and always will be because they're on that list.

I would indeed...except refresh/reset the list on occasion to prevent a skewed cumulative statistic. Maybe a "most d/led this month" list?


How do you propose we present you with the list of "recordings that sound good" in an automatic fashion?

A bit hard, I'll admit, then again, I'm satisfied with 256K encoded MP3s in many cases. No answer here...

This post was modified by daveg on 2003-05-30 14:47:06

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 30, 2003 9:08am
Forum: etree Subject: Re: Batting Averages revisited

I would indeed...

Well, that's still on all the pages, just scroll down a bit.

Monthly stats are a bit hard to implement - it means that every single visit needs to be recorded in the database with the time of visit. Right now a count is just incremented.
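
The difference between the two storage approaches can be sketched like this. It is a toy in-memory illustration under assumed names (`CounterOnly`, `TimestampedLog`, and the show ids are hypothetical), not a description of the Archive's database.

```python
from collections import Counter
from datetime import datetime


class CounterOnly:
    """Current approach as described: one running total per show."""

    def __init__(self):
        self.totals = Counter()

    def record_visit(self, show_id):
        self.totals[show_id] += 1  # the time of the visit is lost


class TimestampedLog:
    """What monthly stats would need: one stored row per visit."""

    def __init__(self):
        self.rows = []

    def record_visit(self, show_id, when):
        self.rows.append((show_id, when))

    def visits_in_month(self, year, month):
        return Counter(
            show_id
            for show_id, when in self.rows
            if when.year == year and when.month == month
        )


log = TimestampedLog()
log.record_visit("show-a", datetime(2003, 4, 30, 12, 0))
log.record_visit("show-a", datetime(2003, 5, 1, 9, 30))
log.record_visit("show-b", datetime(2003, 5, 2, 18, 15))
may_counts = log.visits_in_month(2003, 5)
```

The counter stays a fixed size per show, while the log grows by one row per visit, which is the cost being pointed out above.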

This post was modified by Jonathan Aizen on 2003-05-30 16:08:20

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 7:26am
Forum: etree Subject: Re: Batting Averages revisited

We don't show "page views" anywhere. We show "directory views" which is the only other metric we have for shows.

Since we don't show # of downloads anywhere, what do you mean by "full stats"? I'm happy to accommodate, but I'm not sure I understand what you're asking for.

This post was modified by Jonathan Aizen on 2003-05-29 14:26:24

Poster: woostahDave Date: May 29, 2003 8:39am
Forum: etree Subject: Re: Batting Averages revisited

for example
http://www.archive.org/audio/etreelisting-browse.php?collection=etree&cat=Club%20d%27Elf%3A%202002

displays directory views. i would like to see number of downloads and (for the mathematically impaired) batting average.

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 9:05am
Forum: etree Subject: Re: Batting Averages revisited

I think you're a bit confused. "Directory views" is not "number of visits" as used in the batting average calculation. I repeat: we do not have "download" counts for concerts (# of downloads via HTTP, FTP, and so on) and that is why we don't show that stat. The "directory visits" count is the number of times people clicked the link on the details page to see the directory via HTTP or to download the show via IAFM.

Batting average can be displayed on the browse page. Please add it to the feature request list.

Poster: woostahDave Date: May 29, 2003 9:13am
Forum: etree Subject: Re: Batting Averages revisited

so now i am very confused. what is "number of visits" if it is not "directory views"?

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 3:29pm
Forum: etree Subject: Re: Batting Averages revisited

"number of visits" would be the number of times the details page was accessed. "number of directory visits" is the number of times users clicked the link on the details page.

Poster: kwaved Date: May 29, 2003 12:34pm
Forum: etree Subject: Re: Batting Averages revisited

You'd think that shows by a band with very few shows in the Archive would have very high BAs, but I think that's not entirely true. It would only hold true if that band were very popular, yet had few shows.

Glen Phillips is a perfect example -- only 23 shows. I dunno anything about this artist but a quick look at the collection indicates that the 8-18-02 show (#1 BA show) has 5280 views while all the 11 other 2002 shows for him total only 2135 views combined. Apparently this one show is immensely popular and the artist himself, whoever he is, seems to be quite popular as well. Not to mention that the 8-18-02 recording seems (based on the Royer mic used) to be of high quality while a couple of the other shows I checked were apparently of lesser quality (again based on the mics used). I'm not sure what the BAs of the other shows are but for whatever reason this one show gets a lot of attention.

At the same time, the batting average computation is supposed to help the underdog, so some such shows will end up there ... chances are that your favorite bands can't compete with the more popular bands, who are affected less by the sample size penalty.

How does the "small sample size penalty" help the underdog ? If anything that hurts the little guy. It seems to me that after some number of visits the sample size penalty should no longer apply.

The research we've done shows that this adjustment really does provide a reasonable way to rank items in the Archive.

I'm not sure I agree with you on this point. For instance, a band like (that's right, you guessed it) SKB has over 200 sources for 2003 alone, and therefore the pool of downloads is spread over a much bigger collection -- after all we do have constraints here such as bandwidth --- not to mention that Kimock is surely less "popular" than frat-rock bands like OAR and Jack Johnson and appeals to a much smaller demographic.

It's a bit frustrating because, as you mentioned Jon, I would much rather never see any information about bands that I am not interested in nor will ever be interested in, and seeing these bands listed day after day right at the top of the landing page --- well that surely isn't useful. Furthermore it probably further skews the stat because if folks see hey this is the top BA show -- let's check it out --- that surely doesn't help the underdog in any way. Of course things like the Spotlight and Staff Picks are, IMO, much more interesting at least to me personally.

On the other hand, if the shows by your favorite bands have very high visit counts (i.e. they're "more popular" than those in the top BA list), yet are not in the top BA list, then they're just not downloaded as much (in terms of frequency).

Again with the SKB example, the most "popular" show for 2003 has about 800 views -- compared to 5280 for the GP show mentioned earlier. This still means that the small sample penalty is roughly 2.6 times higher for that SKB show than for the GP show --- and frankly that seems quite inequitable when weighed with the size of the collection for a given artist, total views etc. Of course -- I have no idea what the constant in your equation is --- perhaps it is weighted with some of these additional factors ??
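
The ratio between the two penalties can be checked without knowing the constant, since it cancels out. The sketch below assumes the view counts quoted in this post stand in for the "visits" in the formula, which may not hold exactly (elsewhere in the thread, directory views and visits are described as different metrics). Variable names are made up.

```python
import math

skb_views = 800   # most-viewed 2003 SKB show, per the post
gp_views = 5280   # Glen Phillips 8-18-02 show, per the post

# penalty = constant / sqrt(visits), so the constant cancels in the ratio:
# (c / sqrt(800)) / (c / sqrt(5280)) = sqrt(5280 / 800)
penalty_ratio = math.sqrt(gp_views / skb_views)
print(f"SKB penalty is about {penalty_ratio:.2f}x the GP penalty")
```

Because the penalty shrinks with the square root of visits, a show with 6.6 times fewer views gets a penalty that is sqrt(6.6) times larger, not 6.6 times larger.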

As for the "My Archive" idea - I don't see it replacing the LMA landing page, but it might become a secondary page. It is no simple task though - regardless, it's possible I'll be able to incorporate it in rev2 (not sure yet when that'll happen, but it'll be a big time upgrade).

Agreed it is a big one, I would not suggest replacing the landing page --- but, like Yahoo and other websites, the user can set up a profile and create a customized view for themselves. That would be great and then I would never have to see all these frat-rock bands listed ever again :o)

This post was modified by kwaved on 2003-05-29 19:34:56

Poster: Diana Hamilton (Administrator, Curator, or Staff) Date: May 30, 2003 2:21am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?

Glen Phillips is a perfect example -- only 23 shows. I dunno anything about this artist but a quick look at the collection indicates that the 8-18-02 show (#1 BA show) has 5280 views while all the 11 other 2002 shows for him total only 2135 views combined. Apparently this one show is immensely popular and the artist himself, whoever he is, seems to be quite popular as well. Not to mention that the 8-18-02 recording seems (based on the Royer mic used) to be of high quality while a couple of the other shows I checked were apparently of lesser quality (again based on the mics used). I'm not sure what the BAs of the other shows are but for whatever reason this one show gets a lot of attention.

You missed something much more special about 8/18/02 than the mics used: Nickel Creek sat in with him for a number of tunes! Glen even mentioned the experience in a diary entry at his site.

There is also a sample mp3 from the date posted on his website. Perhaps that also piques his fans' interest in this particular gig.

In addition, you could well have word of mouth among NC fans- maybe they're driving up the BA, not the GP fans. (They may even be more popular too, who knows- they won a Grammy recently.)

...and a quick check shows the night was also mentioned favorably in Nickel Creek's online journal too.

Possible party pooping ahead
...Uh, hmm actually, looking further- NC is still in pending section here and their policy as posted here makes me wonder if they'd even be up for archiving. Should 8/18/02 even be here yet?

Anyway...
whoever he is

Thanks to IAFM, it is now so easy to painlessly queue up stuff from folks you never heard of. Maybe check this mystery man's music out, instead of just what mics people used on him? So you'll drive up his BA a notch, c'est la vie. ;)

This post was modified by hamilton on 2003-05-30 09:21:58

Poster: kwaved Date: May 30, 2003 2:52am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?

You missed something much more special about 8/18/02 than the mics used: Nickel Creek sat in with him for a number of tunes! Glen even mentioned the experience in a diary entry at his site.

Personally speaking I have no interest in the music of NC and I bet that GP would be of little interest as well -- I did go to his website yesterday and I learned he is one of the Toad the Wet Sprocket dudes --- definitely not of interest to me at all. But you are right of course: given the NC sit-in I could see how this show would be very popular even though of no interest to me whatsoever.

Should 8/18/02 even be here yet?

About that: given that it was a GP show, doesn't that mitigate the situation? I mean if David Lindley sat in with Kimock I bet the show would still be recorded and available even though Lindley is very much anti-taping.

Maybe check this mystery man's music out, instead of just what mics people used on him?

No thanks Diana {LOL} As for the mic bias I am definitely one of those people that early on in my trading days would grab shows based on the show/performance first and recording quality second but as I obtained show after show that were basically unlistenable or listened to only once I have developed a serious bias towards high quality recordings. Therefore mics, mic placement, pre-amp, recordist, transfer, DAE are all very important to me when deciding which shows to download and listen to.

This post was modified by kwaved on 2003-05-30 09:52:10

Poster: Diana Hamilton (Administrator, Curator, or Staff) Date: May 30, 2003 3:48am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?

Should 8/18/02 even be here yet?

About that: given that it was a GP show, doesn't that mitigate the situation? I mean if David Lindley sat in with Kimock I bet the show would still be recorded and available even though Lindley is very much anti-taping.

I don't personally know how Kimock would handle that example, but in some cases trade-friendly artists defer to trade-unfriendly guests or co-bills and call for no taping in those cases. Examples: Grateful Dead backing Dylan; Phil Lesh co-bill with Dylan; MMW co-bill with Zorn; Tim O'Brien with Steve Earle as guest (it even specifies that example in Tim's policy IIRC).

On a semi-related note, Tim Reynolds said yes to archive but Dave Matthews changed his mind- and so we don't have Dave & Tim shows up anymore, only just Tim's...

Poster: kwaved Date: May 30, 2003 4:04am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?

Oh I see well then I guess it is worth checking out with Glen's people ?

Poster: Erich Date: May 30, 2003 6:49am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?


Originally posted by hamilton
On a semi-related note, Tim Reynolds said yes to archive but Dave Matthews changed his mind- and so we don't have Dave & Tim shows up anymore, only just Tim's...

I was going to use that example as well. But what if it was different? In essence, the D+T shows are Dave Matthews plus special guest Tim Reynolds. That's even how Tim is paid for the show. But what if Dave guested at one of Tim's shows for a song or two? This will NEVER happen, but similar situations on the archive have (Bela Fleck + The Flecktones had Dave guest a few times, I think Soulive did once). How does that get treated?

This post was modified by Erich on 2003-05-30 13:49:12

Poster: kwaved Date: May 30, 2003 9:11am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?

But what if Dave guested at one of Tim's shows for a song or two? This will NEVER happen, but similar situations on the archive have (Bela Fleck + The Flecktones had Dave guest a few times, I think Soulive did once). How does that get treated?

My guess is that if tapes were made that circulate then the de facto rules of the primary artist prevail. For instance if a guest (like Dylan) did not want taping during his Dead touring days then taping was not permitted. Of course tapes still exist and I believe the archive would have to pursue a course of due diligence to ensure such "stealth" or "illegal" recordings do not make it to the collection.

I wonder if the GP with NC falls in that category?

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 30, 2003 9:58am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?

I'll chime in here with my opinion: if the show is billed under an artist that has granted permission and guests join the artist on stage, that has no effect on whether or not the show should be hosted. If the artist leaves the stage and the guests remain playing their own music, that's a different story.

Poster: Erich Date: May 30, 2003 10:32am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?


Originally posted by Jonathan Aizen
I'll chime in here with my opinion: if the show is billed under an artist that has granted permission and guests join the artist on stage, that has no effect on whether or not the show should be hosted. If the artist leaves the stage and the guests remain playing their own music, that's a different story.


How does that affect:

1> Superjam, in case there's a situation where during the jam an un-archive friendly artist starts to jam their own tunes before going into the next jam?

2> the opposite case with your scenario, where the non-archive friendly artist leaves the stage and the archive friendly guest plays their own music?

Poster: Diana Hamilton (Administrator, Curator, or Staff) Date: May 30, 2003 11:32am
Forum: etree Subject: Re: Superjam hypothetical

Superjam

The only "Superjam" we have permission for here is for the Bonnaroo Superjam, a just-for-that-festival assemblage. Perm was granted by the Bonnaroo festival people- who I gather had a taping/trading friendly policy covering the whole festival (all artists OK with it at least for that festival).

Surely a unique situation, as all other "Superjams" seem to be- or is there a "real" band called "Superjam" that I'm not aware of? We don't have permission for that if so.

...BTW thanks Jon for clarifying GP sit-in thing above! All safe for Glen's show, eh- which I plan to check out next week after all the buzz. :)

This post was modified by hamilton on 2003-05-30 18:32:35

Poster: TeamRoyer Date: May 30, 2003 11:13am
Forum: etree Subject: Re: Glen Phillips' special show- reconsider hosting?

As the taper of the show I can chime in with the fact that Nickel Creek has given authorization to host their co-billed shows, and it is even posted on the archive; see the following:
http://etree06.archive.org/0/audio/glen2002-02-05.shnf/Nickel%20Creek%20Permission.txt
Prior to this, and prior to uploading the show, I personally talked with Sara of Nickel Creek as well as Glen and his management about hosting this specific show and they were very pleased with the idea.

I hope this helps =)

Peace, Chris

Poster: Diana Hamilton (Administrator, Curator, or Staff) Date: May 30, 2003 11:54am
Forum: etree Subject: Re: Glen Phillips' special show- quite OK!

Wow Chris, that gets the prize for Most Useful Link of the Day. Nothing hypothetical about that. Thanks to you and Jeff, as well as the gracious artists! :)

...Txt file is now pointed to on both NC and GP policy pages here- that should wrap this up.

This post was modified by hamilton on 2003-05-30 18:54:32

Poster: TeamRoyer Date: May 30, 2003 11:30pm
Forum: etree Subject: Re: Glen Phillips' special show- quite OK!

Thanks for getting that added to the policies Diana; on a related aside did you ever get the email I sent you in regards to Glen's approval for hosting of Toad The Wet Sprocket shows? It seems like it got lost in your inbox but I didn't want to send it more than once if you are just taking time to sift through them, let me know =) Thanks, Chris

Poster: Diana Hamilton (Administrator, Curator, or Staff) Date: May 31, 2003 11:48am
Forum: etree Subject: Re: Toad the Wet Sprocket- upload away!

Yikes, thanks for mentioning that, it was tucked into a neglected swath from March, had to bring it up by keyword search. The way things are going I would have noticed it by August. :O

So, all activated now, my apologies for the delay!

Poster: kwaved Date: May 29, 2003 12:40pm
Forum: etree Subject: Sample Size Correction

I think the problem I am having with the BA stat might lie in the sample size correction. You have really piqued my interest Jon -- what is this constant you mentioned ? Is it a constant for each artist or a collection-wide value ?

It seems to me that an artist level constant might help take into account some of the other factors in play here --- perhaps by valuing the total views by artist, total shows by artist, and of course the all important "frat-rock quotient" (just kidding greeks), a more equitable correction could be applied to the BA formula?

Also perhaps time is a reasonable element to consider here --- if a show gets 100 views in 1 month while another gets 200 views in 2 years that would seem to be a factor worth considering.
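
The time element suggested above could be as simple as a per-month rate. This is a rough sketch of that idea using the two cases from the post; `views_per_month` is a made-up helper, not an existing Archive statistic.

```python
def views_per_month(total_views, months_available):
    """Average views per month over the time a show has been available."""
    if months_available <= 0:
        raise ValueError("months_available must be positive")
    return total_views / months_available


recent_show = views_per_month(100, 1)   # 100 views in 1 month
older_show = views_per_month(200, 24)   # 200 views in 2 years
```

By raw totals the older show looks twice as popular, but per month the recent show is drawing views at roughly twelve times the rate.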

The bottom line for me is that BA stat does not do justice to the full scope of trends and tendencies that can be culled from the downloading habits of the users of the collection and the collection itself, and in fact tends to "reward" a small segment, and therefore does not adequately reflect the diversity of the collection and the interest that is shown by downloaders across all available materials.

This post was modified by kwaved on 2003-05-29 19:40:27

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 3:54pm
Forum: etree Subject: Re: Sample Size Correction

I'll respond to both your posts here.

Constant is 1.1, for every item in the Archive. You can do the math to see what effect changing it would have. We cannot set it differently for different bands as that'd defeat the whole purpose - then it wouldn't be a constant and would lose its meaning.

How does the "small sample size penalty" help the underdog ? If anything that hurts the little guy. It seems to me that after some number of visits the sample size penalty should no longer apply.

The small sample size penalty doesn't help the underdog, but batting averages do (as opposed to sheer download counts). The idea is that a show with relatively few visits yet many downloads (even if the number of visits is small and is penalized), will have a very high batting average. Take for example a show with 100 visits and 99 downloads. Its BA will be 88% (99/100 - 1.1/sqrt(100)). Now take a show with 1000 visits and 500 downloads. Its BA will be 47%. If we were using sheer download counts, you'd never see the first show, and you'd never stop seeing the second show on the top list.
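
The two worked examples above can be reproduced in a few lines, along with the two rankings being contrasted. This is a sketch using the formula and the 1.1 constant from this post; the function name and the show labels are made up.

```python
import math

CONSTANT = 1.1  # value stated in the post


def batting_average(downloads, visits, constant=CONSTANT):
    return downloads / visits - constant / math.sqrt(visits)


shows = {
    "small show": {"downloads": 99, "visits": 100},
    "big show": {"downloads": 500, "visits": 1000},
}

# Rank once by adjusted BA and once by sheer download count.
by_ba = sorted(shows, key=lambda s: batting_average(**shows[s]), reverse=True)
by_downloads = sorted(shows, key=lambda s: shows[s]["downloads"], reverse=True)

for name, stats in shows.items():
    print(f"{name}: BA = {batting_average(**stats):.1%}")
```

The small show comes out at 88% and the big one at roughly 46.5% (47% after rounding), so the small show ranks first by BA while the big show ranks first by raw downloads.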

The reason we have the small sample size correction is so that we can fairly rank items side by side. It wouldn't be fair to say an item that was downloaded 5 times out of 10 is as interesting as an item that was downloaded 5000 out of 10000 times. Makes sense, right?
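
To see how the correction separates those two items, here is a quick check using the formula and the 1.1 constant given earlier in this post. The function name is made up for illustration.

```python
import math


def batting_average(downloads, visits, constant=1.1):
    # constant=1.1 is the value given earlier in this post
    return downloads / visits - constant / math.sqrt(visits)


tiny_sample = batting_average(5, 10)          # raw BA 50%
large_sample = batting_average(5000, 10000)   # raw BA 50%
```

Both items have a raw BA of 50%, but the corrected values come out near 15% for the 10-visit item and near 49% for the 10000-visit item.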

seeing these bands listed day after day right at the top of the landing page

Granted the GP show is an exception - it has remained in the top 5 for a long time (though do note that it hasn't been #1 all along). The statistics I collected show that in March and April the top 5 BA list changed almost every single day (order independent, just membership!), whereas the top 5 download list changed fewer than 8 days out of each month!
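
An "order independent, just membership" comparison like the one described can be done by comparing each day's list as a set. This is a minimal sketch with invented data and an invented function name; it is not the script used to collect the statistics quoted above.

```python
def days_with_membership_change(daily_top_lists):
    """Count days whose top list differs from the previous day's,
    comparing membership only (order is ignored)."""
    changes = 0
    for previous, current in zip(daily_top_lists, daily_top_lists[1:]):
        if set(previous) != set(current):
            changes += 1
    return changes


# Made-up data: three days of a top-3 list.
history = [
    ["a", "b", "c"],
    ["b", "a", "c"],  # reordered only: not a change
    ["a", "b", "d"],  # "d" replaced "c": a change
]
changed_days = days_with_membership_change(history)
```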

Furthermore it probably further skews the stat because if folks see hey this is the top BA show -- let's check it out

This is exactly why the batting average is good. If lots of people click the show because it is in the top BA list, it will drive its BA down unless the users find it valuable enough to download.

I think your aforementioned problem with the BA statistic is the same problem you'd have with all the statistics - you don't like what other people find interesting, on a global level. You'd be more interested in what shows and sources other Kimock fans like, and that's completely reasonable. That kind of work will happen some day.
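
The self-correcting effect can be illustrated with made-up numbers. The sketch below uses the formula and constant from this thread; the show and its counts are hypothetical.

```python
import math


def batting_average(downloads, visits, constant=1.1):
    # Formula and constant as given in this thread.
    return downloads / visits - constant / math.sqrt(visits)


# Hypothetical show before it lands on the top BA list.
before = batting_average(downloads=600, visits=1000)

# 500 extra visitors click through from the list...
curious_only = batting_average(downloads=600, visits=1500)     # none download
found_valuable = batting_average(downloads=1000, visits=1500)  # 400 download
```

If the extra visitors only look, the BA falls from about 57% to about 37%. If most of them download, it rises to about 64%. Extra traffic alone does not keep a show on the list.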

Of course things like the Spotlight and Staff Picks are, IMO, much more interesting at least to me personally.

Interesting that you say this - because these items are the least popular with everyone else (if you measure popularity in terms of the percentage of people who download them). Spotlight and Pick List mentions send tons of traffic to the items, yet virtually no one downloads the show when they come from there.

The bottom line for me is that BA stat does not do justice to the full scope of trends and tendencies

So if this is so, I'd like to hear the alternative, on the global scale (not the local scale, which tailors directly to your own interests - like a My Archive, or recommendations from one show to another). I doubt you'd say that the top 5 download list is better...

Anyway, your input is very very valuable and I appreciate it.

This post was modified by Jonathan Aizen on 2003-05-29 22:54:25

Poster: kwaved Date: May 30, 2003 1:19am
Forum: etree Subject: Re: Sample Size Correction

It wouldn't be fair to say an item that was downloaded 5 times out of 10 is as interesting as an item that was downloaded 5000 out of 10000 times. Makes sense, right?

Well, perhaps it does make sense for a show with only 10 views, but what of a show with 1000 views versus one with 10000 views? The sample size corrections in these cases are about 3.48% and 1.1% respectively. Frankly, I do not think that an additional penalty of roughly 2.4% makes sense at all, specifically in the case where one artist (the one with fewer views) has an order of magnitude more shows available for download. Perhaps an adjustment based on collection size AND number of views makes more sense.
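For reference, the size of the correction at different view counts can be tabulated with a few lines. This assumes the penalty term is 1.1/sqrt(views), as quoted earlier in the thread; the function name is hypothetical.

```python
import math

def sample_size_penalty(views, factor=1.1):
    """Penalty subtracted from the download rate, assuming factor / sqrt(views)."""
    return factor / math.sqrt(views)

for views in (10, 100, 1000, 10000):
    print(f"{views:>6} views: penalty {sample_size_penalty(views):.2%}")
```

The penalty never reaches zero under this formula. It shrinks by a factor of about 3.16 for every tenfold increase in views.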

For instance, artist 1 has 10 shows and artist 2 has 100 shows available, and show1 for artist 1 has 10000 views while show2 for artist 2 has 1000 views. If we multiplied views by collection size before taking the sqrt, then we (perhaps) have an even playing field, i.e. the sample size correction would be the same for both show1 and show2 (10 x 10000 = 100 x 1000). I guess my point is that collection size (number of shows for an artist) does play a role here and should be taken into account somehow.
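A minimal sketch of this proposal, assuming it means replacing views with views x collection size under the square root and keeping the 1.1 factor. The function and parameter names are hypothetical, and this is a reading of the suggestion, not anything the Archive implemented.

```python
import math

def adjusted_penalty(views, collection_size, factor=1.1):
    """Proposed variant: penalty based on views * collection_size (hypothetical)."""
    return factor / math.sqrt(views * collection_size)

# The two shows from the example above.
show1 = adjusted_penalty(views=10000, collection_size=10)
show2 = adjusted_penalty(views=1000, collection_size=100)

print(math.isclose(show1, show2))
```

One side effect worth noting: multiplying by collection size shrinks the penalty for every artist with more than one show, so the 1.1 factor would probably need retuning. That is a guess, not something worked out in the thread.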

For somewhat less popular artists, the most popular shows will never reach the download strength of the most popular artists, yet it is still significant that show 1 is being downloaded more frequently than show 2 for a less popular artist ... my point being that the small sample penalty seems unfair as currently implemented.

This is exactly why the batting average is good. If lots of people click the show because it is in the top BA list, it will drive its BA down unless the users find it valuable enough to download.

Yes and no, of course I see your point but if the collection size for an artist remains low (ie fewer show choices) I think it is clear that the top BA shows would continue to be so while for artists with larger collections the most popular shows probably shift around quite a bit more often.

Spotlight and Pick List mentions send tons of traffic to the items, yet virtually no one downloads the show when they come from there.

Interesting, clearly the spotlight shows do send more traffic to those items and I bet because of that there are more downloads as well. Take the current spotlight show --- traffic has definitely increased on the 4-20-2 SKB show and I did notice that a few folks were grabbing it as well (based on the system status page) ...

So if this is so, I'd like to hear the alternative, on the global scale (not the local scale, which tailors directly to your own interests

I think several factors are not being represented by the current BA stat.

Artist Collection Size - I believe that shows by artists with a larger number of shows available will always have fewer views and downloads on a show-by-show basis, and the sample size penalty unfairly skews the stat against them. Perhaps the sample size penalty can be changed to take this into consideration.

FTP vs HTTP/IAFM - I think it is clear that the demographic and computer literacy of downloaders varies greatly from artist to artist with (in general) less sophisticated users tending to use HTTP/IAFM which again skews the BA statistic unfairly. --- I'm not sure this can be helped though.

Sample Size Correction - when would it (if ever) make sense to ignore this completely? Clearly it does its job for REALLY small samples but for shows that have the highest number of views for a given artist the penalty is still significant when compared to vastly more popular artists and that, IMO, renders the stat nearly as useless as "number of downloads"

This post was modified by kwaved on 2003-05-30 08:19:17

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 4:11pm
Forum: etree Subject: Re: Sample Size Correction

One more thing in defense of the batting average: it's timely. The recently popular moe. show from May 6th came up into the BA list while it was temporarily of interest, then shot back out when users lost interest. This is unique to the top BA list.

Anyway, not trying to be defensive here, but am trying to make it clear that it beats what we had before and that's why it's prominently placed. It is not self-reinforcing and there is something interesting about that.

I also think this conversation is particularly useful to have because it brings to light that users don't really seem to understand the way the batting average works, and it also provides very useful insight about where we could potentially focus our research in the future.

so thanks

This post was modified by Jonathan Aizen on 2003-05-29 23:11:17

Poster: Diana Hamilton (Administrator, Curator, or Staff) Date: May 29, 2003 5:41am
Forum: etree Subject: Re: Batting Averages revisited

Possibly for some high-avg shows, fans or the bands themselves could have publicized direct links to the individual items in mailing lists or newsletters or official websites. Fans who see those links could tend to shoot right to those items and start downloading. I hypothesize that the "focused attention" would increase BA.

Poster: chops11 (Administrator, Curator, or Staff) Date: May 29, 2003 6:08am
Forum: etree Subject: Re: Batting Averages revisited

i agree and disagree with anything and everything said here on this topic. :) however, i'm also thinking Jon that the IAFM is going to make this stat a little less truthful? being that a show that gets really popular is going to not be recorded in the BA calculations?

cheers
ed

Poster: Jonathan Aizen (Administrator, Curator, or Staff) Date: May 29, 2003 6:50am
Forum: etree Subject: Re: Batting Averages revisited

IAFM should help, not hinder, the accuracy of the stat. Since the only way to download an item through IAFM is to click on the link on the details page, the download will always get tracked (as opposed to when a user logs into the FTP server directly without clicking the link on the details page).

Anyway though, I imagine that most people are still using FTP and HTTP, not IAFM.

In truth though, the batting average is just meant to be an alternative to the "Most Downloaded" list, one which is more volatile.

Poster: chops11 (Administrator, Curator, or Staff) Date: May 29, 2003 7:34am
Forum: etree Subject: Re: Batting Averages revisited

well in that case good work jon :)

cheers
ed